1. INTRODUCTION


Using the digitized ground-truth inventories developed during Phase III, 
detailed analyses of the Classification and Mensuration Subsystem (CAMS) 
classification procedure can be performed. The purpose of the CAMS procedure 
is to determine the small-grains proportion in a segment. All of the 
processings used in this study were passed to the aggregation system as good 
processings, although some were not used in the aggregations. 

The CAMS classification procedure follows these steps: 

a. Two sets of dots are labeled as wheat or nonwheat by the analysts. 

b. Using one set of analyst-labeled dots (type 1 dots) as seed picture ele- 
ments (pixels), all of the pixels in the segment are grouped into clusters 
on the basis of their spectral values. 

c. Each of the clusters is labeled as wheat or nonwheat by the type 1 
analyst-labeled dot closest to the mean of the cluster. 

d. On the basis of the means and variances for each cluster, every pixel in 
the segment is classified as either wheat or nonwheat. 

e. Using the second set of analyst-labeled dots (type 2 dots) as a random 
sample of the segment, the machine classification proportion is corrected 
for any bias introduced by the classification process. 
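Steps b and c can be illustrated with a short sketch. It assumes that pixels and dots are tuples of spectral values and uses a plain k-means-style nearest-mean update with Euclidean distance as a stand-in for the actual CAMS clustering algorithm, which is not described here. The function names `seeded_clusters` and `label_clusters` are hypothetical, and step d (the per-pixel classification from cluster means and variances) is not sketched.

```python
import math

def seeded_clusters(pixels, seeds, iterations=10):
    """Group pixels around seed pixels (the type 1 dots) by nearest mean.

    Stand-in for the CAMS clustering (an assumption): each seed starts one
    cluster, and the cluster means are re-estimated from the assigned pixels.
    Returns (cluster index of each pixel, list of cluster means).
    """
    means = [tuple(s) for s in seeds]
    assignment = [0] * len(pixels)
    for _ in range(iterations):
        # Assign every pixel to the spectrally closest cluster mean.
        assignment = [min(range(len(means)), key=lambda k: math.dist(p, means[k]))
                      for p in pixels]
        # Re-estimate each mean; an empty cluster keeps its previous mean.
        new_means = []
        for k, old in enumerate(means):
            members = [p for p, a in zip(pixels, assignment) if a == k]
            if members:
                new_means.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_means.append(old)
        means = new_means
    return assignment, means

def label_clusters(means, dots, dot_labels):
    """Give each cluster the label of the type 1 dot closest to its mean."""
    return [dot_labels[min(range(len(dots)), key=lambda i: math.dist(m, dots[i]))]
            for m in means]

# Usage: two spectral groups seeded by one wheat dot and one nonwheat dot.
pixels = [(10, 12), (11, 13), (30, 31), (29, 30), (12, 11)]
seeds = [(10, 12), (30, 31)]
assignment, means = seeded_clusters(pixels, seeds)
labels = label_clusters(means, seeds, ["wheat", "nonwheat"])
```

Seeding one cluster per labeled dot keeps the sketch short; a production clustering would typically also split and merge clusters, which is omitted here.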

The proportion of wheat in a segment can be estimated at four steps in the 
procedure: 

a. The type 2 dots can be used as a random sample of the segment to 
determine a proportion. 

b. At the machine clustering stage, a proportion can be determined using the 
analyst label for each cluster. 

c. The machine classification proportion is calculated using the CAMS 
procedures. 

d. The bias-corrected machine proportion is calculated using the CAMS 
procedures. 


If the procedure is effective, the proportion estimate should improve at each 
step. The CAMS procedures will be evaluated by calculating the proportion of 
small grains at each of these four steps: type 2 dots as a random sample, 
machine clusters, machine classification, and bias-corrected machine classi- 
fication. 
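The four estimates can be sketched as follows, assuming that labels are booleans (True = wheat or small grains) and that each pixel's cluster index and machine-classification label are available. The report does not give the CAMS bias-correction formula, so the stratified correction used here for step e (estimate the true wheat rate among machine-wheat and machine-nonwheat type 2 dots, then weight by the machine proportions) is an assumption, as are all of the names.

```python
def proportion(labels):
    """Fraction of labels that are True (wheat)."""
    return sum(labels) / len(labels)

def four_estimates(type2_labels, type2_machine, pixel_cluster, cluster_labels, pixel_class):
    """Return the random-sample, clustering, classification, and bias-corrected estimates.

    type2_labels   -- analyst label of each type 2 dot (True = wheat)
    type2_machine  -- machine-classification label of the pixel under each type 2 dot
    pixel_cluster  -- cluster index of every pixel in the segment
    cluster_labels -- label given to each cluster by its nearest type 1 dot
    pixel_class    -- machine-classification label of every pixel
    """
    random_sample = proportion(type2_labels)
    clustering = proportion([cluster_labels[c] for c in pixel_cluster])
    machine = proportion(pixel_class)

    def wheat_rate(machine_label):
        # Share of the type 2 dots in one machine stratum that the analyst calls
        # wheat; with no dots in the stratum, fall back to the machine label.
        hits = [lab for lab, m in zip(type2_labels, type2_machine) if m == machine_label]
        return sum(hits) / len(hits) if hits else float(machine_label)

    # Assumed stratified correction: weight each stratum's wheat rate by its size.
    corrected = machine * wheat_rate(True) + (1 - machine) * wheat_rate(False)
    return random_sample, clustering, machine, corrected
```

Because every estimate is returned from the same inputs, the step-to-step comparisons described in this section reduce to comparing the four numbers against the ground-truth proportion.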

The results of these studies will be given for three groups: winter wheat 
segments, spring wheat segments, and mixed wheat segments. The winter wheat 
segments were those located in Colorado, Kansas, Nebraska, Oklahoma, and Texas, 
and the spring wheat segments were those in Minnesota and North Dakota. All of 
the segments in Montana and South Dakota were grouped as mixed wheat, although 
some of these segments were processed as winter or spring wheat. 

When necessary to aggregate the pixels in a segment into small grains and non- 
small grains, winter wheat, spring wheat, barley, rye, flax, and oats were 
aggregated as small grains and all other crops were aggregated as nonsmall 
grains. 
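A minimal sketch of this two-class aggregation follows. It assumes the ground-truth crop names are plain strings spelled as below (hypothetical spellings) and does not handle the strip-fallow categories, which tables 5 and 6 report separately. The function names are hypothetical.

```python
# Crops aggregated as small grains in this study; all other crops are nonsmall grains.
SMALL_GRAINS = frozenset({"winter wheat", "spring wheat", "barley", "rye", "flax", "oats"})

def is_small_grain(crop):
    """Map a ground-truth crop name to the small grains / nonsmall grains label."""
    return crop.strip().lower() in SMALL_GRAINS

def small_grains_proportion(pixel_crops):
    """Ground-truth small-grains proportion of a segment from per-pixel crop names."""
    return sum(is_small_grain(c) for c in pixel_crops) / len(pixel_crops)
```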


2. CAMS CLASSIFICATION RESULTS 

Figures 1 through 4 show the errors in the estimates at each of the four stages 
in the CAMS procedure, using the last processing for each segment. The errors 
are plotted as a function of the true small-grains proportion for each segment. 
The general trend for all four estimates is an underestimation of the 
small-grains proportion, with the largest errors occurring for segments with 
large small-grains proportions. 

Figure 1.- Analyst-labeled type 2 dots as random sample. 
Figure 2.- Machine clusters with analyst labels. 
Figure 3.- Machine classification. 
Figure 4.- Bias-corrected machine classification. 
(Each figure plots error in proportion estimate, percent, against ground-truth 
proportion, percent.) 

The mean error and the standard deviation (SD) of the errors were calculated 
to quantify the errors. The mean error gives a measure of the bias of the 
estimator, and the SD is a measure of its variability. The mean square 
error (MSE), a measure of the overall performance of the estimator, was also 
calculated. The mean error, SD, and MSE are shown in table 1; these results 
indicate that the estimate of small-grains proportion did not improve 
significantly from one step to the next. In all cases, the bias was 
approximately -6 percent (an underestimate), with an SD of approximately 
10 percent. 
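The three summary statistics can be computed from the per-segment errors (estimate minus ground truth, in percent). The report does not say whether the SD divides by n or by n - 1. This sketch assumes n, so that the MSE decomposes exactly as the squared bias plus the squared SD. The function name is hypothetical.

```python
import statistics

def error_summary(estimates, truths):
    """Mean error (bias), SD, and MSE of proportion estimates, in percent.

    Errors are estimate minus ground truth, so a negative mean error means
    the procedure underestimates. The SD uses the population form (divide
    by n), which makes MSE = bias**2 + SD**2 hold exactly.
    """
    errors = [e - t for e, t in zip(estimates, truths)]
    bias = statistics.fmean(errors)
    sd = statistics.pstdev(errors)
    mse = statistics.fmean([e * e for e in errors])
    return bias, sd, mse
```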

Another way of analyzing the procedure is to calculate the improvement in the 
error between any two steps, defined as the absolute error of the earlier step 
minus the absolute error of the later step. A positive improvement indicates 
that the error was smaller in the later step than in the earlier step. The 
percentage of processings in which there was an improvement can also be 
calculated. If a step is effective, the percentage of processings improved 
should be greater than 50 percent, and the mean improvement should be greater 
than zero. These calculations for the CAMS results are shown in table 2. All 
of the comparisons indicate very little improvement in the error at any step; 
overall, about half the processings improved and half became worse. The mean 
improvement was less than 0.5 percent. 
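The improvement comparison can be sketched as below. The function name is hypothetical. Ties (no change in absolute error) are assumed to count as not improved.

```python
def improvement_stats(earlier_errors, later_errors):
    """Compare two steps of the procedure, processing by processing.

    Improvement = |earlier error| - |later error|, so a positive value means
    the later step came closer to the ground truth. Returns the mean
    improvement and the percentage of processings that improved.
    """
    gains = [abs(a) - abs(b) for a, b in zip(earlier_errors, later_errors)]
    mean_gain = sum(gains) / len(gains)
    pct_improved = 100 * sum(g > 0 for g in gains) / len(gains)
    return mean_gain, pct_improved
```

An effective step shows both a positive mean improvement and a percentage improved well above 50.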

In analyzing the differences between the machine classification estimates and 
the machine clustering estimates, the mean improvement was found to be 
0.04 percent, with an SD of 0.46 percent. A linear regression of the machine 
classification error against the machine clustering error gave a slope of 
1.003 and an intercept of -0.185 percent, with a coefficient of determination 
of 0.9985. This result indicates that the classification results are 
essentially the same as the clustering results. A plot of the classification 
error as a function of the clustering error is shown in figure 5. To 
investigate this relationship further, a pixel-level comparison was made 
between the classification results and the clustering results. This 
comparison indicates that 96 percent of the pixels do not change their label 
from the clustering stage to the classification stage and that the average net 
change in pixel counts was only 0.3 percent, indicating that the 
classification step is unnecessary. 
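The regression and pixel-level checks can be sketched with ordinary least squares and a simple label comparison. The function names are hypothetical. Interpreting "net change in pixel counts" as the change in the number of small-grains pixels, expressed as a percentage of all pixels, is an assumption.

```python
from statistics import fmean

def regress(x, y):
    """Least-squares slope, intercept, and coefficient of determination of y on x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy * sxy / (sxx * syy)  # equals r**2 for a one-variable fit
    return slope, intercept, r2

def pixel_agreement(cluster_labels, class_labels):
    """Percent of pixels with an unchanged label, and net change in small-grains pixels."""
    n = len(cluster_labels)
    same = sum(a == b for a, b in zip(cluster_labels, class_labels))
    net = sum(class_labels) - sum(cluster_labels)
    return 100 * same / n, 100 * net / n
```

A slope near 1, an intercept near 0, and a coefficient of determination near 1 together mean that the second estimate reproduces the first, which is the pattern reported above.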
TABLE 1.- CAMS CLASSIFICATION ERRORS 

Figure 5.- Classification versus clusters. 

2.1 CAMS CLASSIFICATION RESULTS USING GROUND-TRUTH DOT LABELING 


The bias and variability in the estimates produced by the CAMS procedure are 
caused both by the procedure itself and by bad input data in the form of 
mislabeled type 1 and type 2 dots. If the segments could be reprocessed using 
the true labels for the type 1 and type 2 dots, any bias or variability in the 
results would be due to the procedure itself and not to bad input data. 

Reprocessing all of the segments would be a large project; an easier way is 
to modify the CAMS results to reflect true dot labels instead of analyst 
labels. For the random-sample estimate using type 2 dots, it is a simple 
matter to replace the analyst labels with the true labels and recalculate the 
proportion. The clustering proportion is determined by aggregating the 
clusters on the basis of the analyst label for the dot closest to the mean of 
each cluster; the ground-truth clustering proportion can be determined by 
aggregating the clusters on the basis of the true dot label instead. It is not 
possible to reproduce the machine classification results using true labels, 
because the means and variances of the clusters are used to classify the 
pixels, and this information is not available on the basis of true labels. 
However, comparison of the classification results with the clustering results 
using analyst labels indicates that the two are essentially identical. It can 
be assumed, therefore, that the classification results would also be 
essentially identical to the clustering results if true labels were used. The 
bias correction can be performed by comparing the ground-truth labels for the 
type 2 dots with the label of the cluster in which each dot lies. The CAMS 
results can thus be reproduced using ground-truth labels without reprocessing 
the segments. 
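The relabeling idea can be sketched for the clustering step: the cluster structure is left untouched, and only the source of the dot labels changes. The inputs (the pixel count of each cluster and the index of the type 1 dot nearest each cluster mean) and the function name are assumptions about how stored CAMS results might be organized.

```python
def cluster_estimate(cluster_sizes, nearest_dot, dot_labels):
    """Clustering proportion: each cluster takes the label of its nearest type 1 dot.

    cluster_sizes -- pixel count of each cluster
    nearest_dot   -- index of the type 1 dot closest to each cluster mean
    dot_labels    -- label (True = small grains) of each type 1 dot; pass
                     either the analyst labels or the ground-truth labels
    """
    small = sum(n for n, d in zip(cluster_sizes, nearest_dot) if dot_labels[d])
    return small / sum(cluster_sizes)

# Same clusters, two label sources: only the second dot's label differs.
sizes, nearest = [100, 50, 50], [0, 1, 2]
analyst_estimate = cluster_estimate(sizes, nearest, [True, False, False])
truth_estimate = cluster_estimate(sizes, nearest, [True, True, False])
```

Here the mislabeled second dot moves the estimate from 0.5 (analyst labels) to 0.75 (ground-truth labels), with no change to the clustering itself.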

The CAMS results using ground-truth labels for the type 1 and type 2 dots are 
shown in figures 6 through 8, which can be compared with the actual CAMS 
results in figures 1 through 4. The scatter in the error is much smaller with 
ground-truth labels, and there is no underestimation for large small-grains 
proportions. The mean error, SD, and MSE for the CAMS results using 
ground-truth labels are shown in table 3. As could be expected, the clustering 
estimates have a great deal more variability than the random-sample or 
bias-corrected estimates. The bias of the clustering estimate was less than 
0.5 percent, indicating that the classification is essentially unbiased; 
however, the clustering does increase the variability significantly. The bias 
correction reduces the errors to about the same level as for the random 
sample. 

Figure 6.- Ground-truth labeled type 2 dots. 
Figure 7.- Machine clusters with ground-truth labels. 
Figure 8.- Bias-corrected machine clusters. 
(Each figure plots error in proportion estimate, percent, against ground-truth 
proportion, percent.) 

TABLE 3.- CAMS CLASSIFICATION ERRORS FOR GROUND-TRUTH DOT LABELS 

Table 4 shows the relative improvement among the three estimates. Clustering 
made the estimate worse for 71 percent of the segments. The bias-corrected 
estimate was better than the random sample for 57 percent of the processings, 
but the mean improvement was only 0.5 percent. 

The results using ground-truth dot labels indicate that the 6-percent negative 
bias and about half of the variability are due to analyst dot-labeling errors. 
The procedure is capable of producing an unbiased estimate with an SD of 
about 4 percent. 

2.2 ANALYST DOT-LABELING ACCURACY 

Because analyst dot-labeling errors are so important, the analyst labeling 
accuracy was studied in detail. The labeling accuracy was determined for 
7677 type 1 dots and 12 037 type 2 dots. The dots used in this study were 
from all processings of each segment, whereas the classification results 
presented in the previous sections were for only the last processing of each 
segment. 

Tables 5 and 6 show the analyst dot-labeling accuracy for type 1 and type 2 
dots, respectively. The analysts labeled small-grains dots correctly about 
61 percent of the time; the labeling accuracy for nonsmall grains was about 
93 percent. 

In the strip-fallow categories, the dots were labeled as small grains about 
42 percent of the time. Because strip-fallow categories are half small grains 
and half nonsmall grains, perfectly labeled strip-fallow dots would be labeled 
as small grains 50 percent of the time. Measured against that 50-percent rate, 
the labeling accuracy for the strip-fallow categories is really about 
85 percent, which is better than the 61-percent labeling accuracy for small 
grains. 
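One plausible reading of the 85-percent figure, offered as an assumption rather than the report's stated method, pools the strip-fallow totals from tables 5 and 6 and scores the pooled rate against the ideal 50 percent. The function name is hypothetical.

```python
def strip_fallow_accuracy(pct_small_grains, ideal_pct=50.0):
    """Score a half-and-half category against its ideal small-grains rate.

    Assumed scoring: 100 percent when exactly ideal_pct of the dots are
    labeled small grains; for the default 50 percent, the score falls
    linearly to 0 at either 0 or 100 percent.
    """
    return 100 * (1 - abs(pct_small_grains - ideal_pct) / ideal_pct)

# Strip-fallow totals from tables 5 and 6: (number of dots, percent labeled small grains).
type1 = (272, 41)
type2 = (528, 43)
pooled = (type1[0] * type1[1] + type2[0] * type2[1]) / (type1[0] + type2[0])
print(round(pooled, 1), round(strip_fallow_accuracy(pooled)))
```

With these inputs the sketch prints `42.3 85`: the pooled rate of about 42.3 percent falls 7.7 points short of 50, which is about 15 percent of the ideal rate.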
TABLE 4.- IMPROVEMENT IN CAMS CLASSIFICATION FOR GROUND-TRUTH DOT LABELS 
TABLE 5.- ANALYST DOT-LABELING ACCURACY FOR PHASE III PROCESSING - TYPE 1 DOTS 

(Dots = number of dots; %C = percent correctly labeled) 

                                  Winter wheat  Spring wheat  Mixed wheat   All categories
Classification                      Dots   %C     Dots   %C     Dots   %C     Dots   %C

Small grains
  Winter wheat                       483   61                     75   57      558   61
  Spring wheat                                     432   73      140   63      572   70
  Barley                                           187   75      139   38      326   60
  Flax                                              21   24       17    6       38   16
  Oats                                25   28      152    ?      227    ?      404   58
  Total small grains                 508   60      792   67      598    ?     1898   62

Strip-fallow small grains*
  Winter wheat                        48   35                    107   46      155   43
  Spring wheat                                      51   37       45   47       96   42
  Barley                                                          21   21       21   24
  Total strip-fallow small grains     48   35       51   37      173   43      272   41

Nonsmall grains
  Alfalfa                             49   90      106   90      151   79      306   85
  Beans                               19   95                                   19   95
  Corn                               159   98      193   95      225   92      577   94
  Sunflower                                        104   98                    104   98
  Sudan grass                         10   90                     12  100       22   95
  Sorghum                            178   92                     26  100      204   93
  Soybeans and guar                   40  100       86  100       11   82      137   99
  Sugar beets                                       27   93       14  100       41   95
  Grass                               47   98       67   94      125   90      239   93
  Hay                                 25   88       63   89      116   83      204   85
  Pasture                            933   97      354   92     1218   96     2505   96
  Trees                               27   8?       42   88       41  100      110   92
  Cotton                              32   97                                   32   97
  Water                               27  100       80  100       86  100      193  100
  Nonagricultural                     37  100       40   98       39   97      116   98
  Homestead                           51   98       22   91       45   69      118   86
  Idle cropland - stubble             13   85                     12   92       25   88
  Idle cropland - cover crop          10   90                                   10   90
  Idle cropland - residue             33   94                     16  100       49   96
  Idle cropland - fallow             190   95      139   94      167   93      496   94
  Total nonsmall grains             1880   96     1323   94     2304   93     5507   94

*The percent correctly labeled for strip-fallow assumes that small grains is the correct label. 
? Illegible in the source copy. 
TABLE 6.- ANALYST DOT-LABELING ACCURACY FOR PHASE III PROCESSING - TYPE 2 DOTS 

(Dots = number of dots; %C = percent correctly labeled) 

                                  Winter wheat  Spring wheat  Mixed wheat   All categories
Classification                      Dots   %C     Dots   %C     Dots   %C     Dots   %C

Small grains
  Winter wheat                       712   61                    149   55      861   60
  Spring wheat                                     738   68      217   56      955   66
  Barley                                           282   70      210   40      492   57
  Rye                                                             16   38       16   38
  Flax                                              27   11       23   30       50   20
  Oats                                32   19      281   59      440   59      753   58
  Total small grains                 744   59     1328   66     1055   53     3127   60

Strip-fallow small grains*
  Winter wheat                        86   36                    179   54      277   47
  Spring wheat                                      75   32      107   41      182   37
  Barley                                                          69   38       69   38
  Total strip-fallow small grains     86   36       75   32      355   47      528   43

Nonsmall grains
  Alfalfa                             53   81      159   89      264   78      476   82
  Beans                                             11   91                     11   91
  Corn                               220   97      228   93      366   92      814   94
  Sunflower                                        170   94       29   93      199   94
  Sudan grass                         14   86       10  100       11  100       35   94
  Sorghum                            291   95                     55   95      346   95
  Soybeans and guar                   51   82      105   94                    156   90
  Sugar beets                                       41   93                     41   93
  Grass                               65   86      120   88      217   89      402   83
  Hay                                 53   98       76   89      188   90      317   91
  Pasture                           1271   96      478   95     1993   93     3742   95
  Trees                               46   96       77   81       95   96      218   90
  Cotton                              57   81                                   57   81
  Millet                                                          14   79       14   79
  Water                               36  100       95  100       86  100      217  100
  Nonagricultural                     58   97       55   93       69  100      182   97
  Homestead                           63   96       48   60       84   85      200   83
  Idle cropland - stubble             22   91                     11   91       33   91
  Idle cropland - cover crop          10   90       12  100                     22   95
  Idle cropland - residue             25  100                     44   95       69   97
  Idle cropland - fallow             343   91      244   81      244   94      831   89
  Total nonsmall grains             2683   94     1929   90     3770   92     8382   92

*The percent correctly labeled for strip-fallow assumes that small grains is the correct label. 
These results are consistent with the underestimation of the small-grains 
proportion by the CAMS procedure: the analysts do a good job of labeling 
nonsmall-grains pixels but mislabel many of the small-grains pixels. 

The accuracy for labeling type 1 dots is slightly better than for type 2 
dots, probably because type 1 dots are not labeled if they fall on field 
boundaries, whereas type 2 dots are labeled regardless of where they fall. 

The CAMS procedure allows the analyst to change the labels of type 2 dots 
after the machine classification has been performed. Table 7 shows a 
comparison of the proportion errors for those segments in which type 2 dot 
labels were changed. There was an overall improvement in the errors when the 
relabeled dots were used, but in the mixed wheat segments the errors became 
worse. To investigate this problem further, the improvement in dot-labeling 
accuracy was calculated for those processings in which dot labels were 
changed; the results of these calculations are shown in table 8. The overall 
improvement in labeling small-grains dots was 4 percent. In the strip-fallow 
and nonsmall-grains categories, the improvement was 1 percent; in the mixed 
wheat segments, the small-grains accuracy went down by 2 percent and the 
nonsmall-grains accuracy went up by 3 percent. The less accurate labeling of 
small grains, coupled with the more accurate labeling of nonsmall grains, 
caused the increased proportion errors observed in the mixed wheat segments. 

2.3 ANALYSIS OF CLUSTERING EFFECTIVENESS 

In the CAMS results using ground-truth dot labels, clustering increased the 
variability (SD) of the estimate from 4 to 7 percent. To investigate this 
problem, the cluster purity was calculated for all clusters of all 
processings. A histogram of cluster purity is given in figure 9: the number of 
clusters with a given small-grains proportion is plotted as a function of the 
small-grains proportion within the cluster. Ideally, this histogram would show 
a maximum near 0 percent small grains, reflecting pure nonsmall-grains 
clusters, a second maximum near 100 percent small grains, reflecting pure 
small-grains clusters, and a minimum near 50 percent. The results for 
procedure 1 clustering show a large number of pure nonsmall-grains clusters, 
but very few pure small-grains clusters. These results show that the 
clustering does not separate the small grains from the nonsmall grains. 

Figure 9.- (Histogram of cluster purity: number of clusters versus 
small-grains proportion within clusters, percent.) 

TABLE 8.- IMPROVEMENT IN ANALYST DOT-LABELING ACCURACY FOR PHASE III PROCESSING 
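The purity histogram can be sketched as below, assuming each pixel has a cluster index and a ground-truth small-grains flag. The 10-percent bin width and the function name are assumptions; the report does not state the bins used for figure 9.

```python
from collections import Counter

def purity_histogram(pixel_cluster, pixel_is_small_grain, bin_width=10):
    """Histogram of clusters by their true small-grains percentage.

    Returns a Counter mapping the lower edge of each bin (0, 10, ..., 90)
    to the number of clusters whose small-grains percentage falls in that
    bin; a cluster at exactly 100 percent is placed in the top bin.
    """
    totals, small = Counter(), Counter()
    for cluster, is_sg in zip(pixel_cluster, pixel_is_small_grain):
        totals[cluster] += 1
        small[cluster] += is_sg
    hist = Counter()
    for cluster, n in totals.items():
        pct = 100 * small[cluster] / n
        edge = min(int(pct // bin_width) * bin_width, 100 - bin_width)
        hist[edge] += 1
    return hist
```

Well-separating clustering would pile counts into the 0 and 90 bins and leave the middle bins nearly empty.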

Each cluster is labeled by the dot closest to the cluster mean. If a 
small-grains cluster is defined as a cluster with more than 50 percent small 
grains, the labeling logic correctly labels the small-grains clusters 
70 percent of the time, based on the analyst dot labels; the nonsmall-grains 
clusters are labeled correctly 91 percent of the time. If ground-truth labels 
are used instead of the analyst labels, the small-grains clusters are labeled 
correctly 80 percent of the time, while the nonsmall-grains clusters are 
labeled correctly 83 percent of the time. This indicates that, given correct 
dot labels, the labeling logic is nearly as effective on small-grains clusters 
as on nonsmall-grains clusters. 
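The cluster-labeling check can be sketched as follows, using the report's definition of a small-grains cluster (more than 50 percent small grains). The inputs and the function name are hypothetical. Passing cluster labels derived from analyst dots or from ground-truth dots gives the two sets of percentages quoted above.

```python
def labeling_accuracy(cluster_sg_fraction, cluster_label_is_sg):
    """Percent of small-grains and nonsmall-grains clusters labeled correctly.

    cluster_sg_fraction -- true small-grains fraction of each cluster (0 to 1)
    cluster_label_is_sg -- True if the nearest-dot label says small grains
    A cluster counts as small grains when its fraction exceeds 0.5.
    """
    pairs = list(zip(cluster_sg_fraction, cluster_label_is_sg))
    sg = [lab for f, lab in pairs if f > 0.5]
    non = [lab for f, lab in pairs if f <= 0.5]
    sg_acc = 100 * sum(sg) / len(sg) if sg else float("nan")
    non_acc = 100 * sum(not lab for lab in non) / len(non) if non else float("nan")
    return sg_acc, non_acc
```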


3. CONCLUSIONS 

Based on these studies, the following conclusions are reached: 

a. The CAMS proportion estimates have a bias of -6 percent with a standard 
deviation of 10 percent. 

b. The -6 percent bias and half of the standard deviation are caused by 
analyst dot-labeling errors. 

c. If the dot labeling were completely accurate, the proportion estimates 
would be unbiased with a standard deviation of 4 percent. 

d. The proportions based on the type 2 dots as a random sample produce as 
good an estimate as the final bias-corrected result. 

e. The proportion estimate produced by the machine classification is 
essentially identical to the estimate produced by clustering; therefore, 
machine classification is nonproductive. 

f. The -6 percent bias is due to the analysts' labeling nonsmall-grains dots 
quite well while mislabeling a large portion of the small-grains dots. 
g. Relabeling the type 2 dots improved the proportion estimates overall, but 
produced worse estimates in mixed wheat states. 

h. Machine clustering does not effectively separate small grains from the 
nonsmall grains (corn, soybeans, grasses, trees, etc.). 

i. The greatest improvement in results would be produced by improving the 
analyst dot-labeling accuracy. 

j. A significant improvement in results would be produced with better 
clustering. 